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Purpose: To describe and mathematically validate the superiorization methodol- 
ogy, which is a recently-developed heuristic approach to optimization, and to discuss 
its applicability to medical physics problem formulations that specify the desired 
solution (of physically given or otherwise obtained constraints) by an optimization 
criterion. 

Methods: The superiorization methodology is presented as a heuristic solver for 
a large class of constrained optimization problems. The constraints come from the 
desire to produce a solution that is constraints-compatible, in the sense of meeting 
requirements provided by physically or otherwise obtained constraints. The underly- 
ing idea is that many iterative algorithms for finding such a solution are perturbation 
resilient in the sense that, even if certain kinds of changes are made at the end of 
each iterative step, the algorithm still produces a constraints-compatible solution. 
This property is exploited by using permitted changes to steer the algorithm to a 
solution that is not only constraints-compatible, but is also desirable according to 
a specified optimization criterion. The approach is very general, it is applicable to 
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29 many iterative procedures and optimization criteria used in medical physics. 

30 Results: The main practical contribution is a procedure for automatically pro- 

31 during from any given iterative algorithm its superiorized version, which will supply 

32 solutions that are superior according to a given optimization criterion. It is shown 

33 that if the original iterative algorithm satisfies certain mathematical conditions, then 

34 the output of its superiorized version is guaranteed to be as constraints-compatible as 

35 the output of the original algorithm, but it is superior to the latter according to the 

36 optimization criterion. This intuitive description is made precise in the paper and 

37 the stated claims are rigorously proved. Superiorization is illustrated on simulated 

38 computerized tomography data of a head cross-section and, in spite of its general- 

39 ity, superiorization is shown to be competitive to an optimization algorithm that is 

40 specifically designed to minimize total variation. 

41 Conclusions: The range of applicability of superiorization to constrained opti- 

42 mization problems is very large. Its major utility is in the automatic nature of 

43 producing a superiorization algorithm from an algorithm aimed at only constraints- 

44 compatibility; while non-heuristic (exact) approaches need to be redesigned for a new 

45 optimization criterion. Thus superiorization provides a quick route to algorithms for 

46 the practical solution of constrained optimization problems. 

47 Keywords: superiorization, constrained optimization, heuristic optimization, tomography, 

48 total variation 



49 I. INTRODUCTION 

50 Optimization is a tool that is used in many areas of Medical Physics. Prime examples are 

51 radiation therapy treatment planning and tomographic reconstruction, but there are others 

52 such as image registration. Some well-cited classical publications on the topic are ~ and 

53 some recent articles are 13-26 . 

54 In a typical medical physics application, one uses constrained optimization, where the 

55 constraints come from the desire to produce a solution that is constraints- compatible, in 

56 the sense of meeting the requirements provided by physically or otherwise obtained con- 

57 straints. In radiation therapy treatment planning, the requirements are usually in the form 
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58 of constraints prescribed by the treatment planner on the doses to be delivered at specific 

59 locations in the body. These doses in turn depend on information provided by an imaging 

60 instrument, typically a Magnetic Resonance Imaging (MRI) or a Computerized Tomogra- 

61 phy (CT) scanner. In tomography, the constraints come from the detector readings of the 

62 instrument. In such applications, it is typically the case that a large number of solutions 

63 would be considered good enough from the point of view of being constraints-compatible; 

64 to a large extent, but not entirely, due to the fact that there is uncertainty as to the exact 

65 nature of the constraints (for example, due to noise in the data collection). In such a case, 

66 an optimization criterion is introduced that helps us to distinguish the "better" constraints- 

67 compatible solutions (for example, this criterion could be the total dose to be delivered to 

68 the body, which may vary quite a bit between radiation therapy treatment plans that are 

69 compatible with the constraints on the doses delivered to individual locations). 

70 The superiorization methodology (see, for example, 22,27-32 ) is a recently-developed heuris- 

71 tic approach to optimization. The word heuristic is used here in the sense that the process 

72 is not guaranteed to lead to an optimum according to the given criterion; approaches aimed 

73 at processes that are guaranteed in that sense are usually referred to as exact. Heuristic 

74 approaches have been found useful in practical applications of optimization, mainly because 

75 they are often computationally much less expensive than their exact counterparts, but nev- 

76 ertheless provide solutions that are appropriate for the application at hand 33-35 . 

77 The underlying idea of the superiorization approach is the following. In many applica- 

78 tions there exists a computationally-efficient iterative algorithm that produces a constraints- 

79 compatible solution for the given constraints. (An example of this for radiation therapy 

80 treatment planning is reported in 36 , its clinical use is discussed in 15 .) Furthermore, often 

81 the algorithm is perturbation resilient in the sense that, even if certain kinds of changes are 

82 made at the end of each iterative step, the algorithm still produces a constraints-compatible 

83 solution 2 ' -30 . This property is exploited in the superiorization approach by using such per- 

84 turbations to steer the algorithm to a solution that is not only constraints-compatible, but is 

85 also desirable according to a specified optimization criterion. The approach is very general, 

86 it is applicable to many iterative procedures and optimization criteria. 

87 The current paper presents a major advance in the practice and theory of superiorization. 

88 The previous publications 22 ' 27-32 used the intuitive idea to present some superiorization 

89 algorithms, in this paper the reader will find a totally automatic procedure that turns an 
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90 iterative algorithm into its superiorized version. This version will produce an output that 

91 is as constraints-compatible as the output of the original algorithm, but it is superior to 

92 that according to an optimization criterion. This claim is mathematically shown to be 

93 true for a very large class of iterative algorithms and for optimization criteria in general, 

94 typical restrictions (such as convexity) on the optimization criterion are not essential for 

95 the material presented below. In order to make precise and validate this broad claim, we 

96 present here a new theoretical framework. The framework of 29 is a precursor of what we 

97 present here, but it is a restricted one, since it assumes that the constraints can be all 

98 satisfied simultaneously, which is often false in medical physics applications. There is no 

99 such restriction in the presentation below. 

100 The idea of designing algorithms that use interlacing steps of two different kinds (in our 

101 case, one kind of steps aim at constraints-compatibility and the other kind of steps aim at 

102 improvement of the optimization criterion) is well-established and, in fact, is made use of 

103 in many approaches that have been proposed with exact constrained optimization in mind; 

104 see, for example, the works of Helou Neto and De Pierro ' 38 , of Nurminski , of Combettes 

105 and coworkers 10 ' 11 , of Sidky and Pan and coworkers 23 ' 42 ' 43 and of Defrise and coworkers 44 . 

106 However, none of these approaches can do what can be done by the superiorization approach 

107 as presented below, namely the automatic production of a heuristic constrained optimization 

108 algorithm from an iterative algorithm for constraints-compatibility. For example, in it is 

109 assumed (just as in the theory presented in our 29 ) that all the constraints can be satisfied 

110 simultaneously. 

111 A major motivator for the additional theory presented in the current paper is to get rid 

112 of this assumption, which is not reasonable when handling real problems of medical physics. 

113 Motivated by similar considerations, Helou Neto and De Pierro present an alternative 

114 approach that does not require this unreasonable assumption. However, in order to solve 

115 such a problem, they end up with iterative algorithms of a particular form rather than having 

116 the generality of being able to turn any constraints-compatibility seeking algorithm into a 

117 superiorized one capable of handling constrained optimization. Also, the assumptions they 

118 have to make in order to prove their convergence result (their Theorem 15) indicate that 

119 their approach is applicable to a smaller class of constrained optimization problems than 

120 the superiorization approach whose applicability seems to be more general. However, for 

121 the mathematical purist, we point out that they present an exact constrained optimization 
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122 algorithm, while superiorization is a heuristic approach. Whether this is relevant to medical 

123 physics practice is not clear: exact algorithms are not run forever, but are stopped according 

124 to some stopping-rule, the relevant questions in comparing two algorithms are the quality 

125 of the actual output and the computation time needed to obtain it. 

126 Ultimately, the quality of the outputs should be evaluated by some figures of merit 

127 relevant to the medical task at hand. An example of a careful study of this kind that 

128 involves superiorization is in 30 (Section 4.3), which reports on comparing in CT the efficacy 

129 of constrained optimization reconstruction algorithms for the detection of low-contrast brain 

130 tumors by using the method of statistical hypothesis testing (which provides a P-value that 

131 indicates the significance by which we can reject the null hypothesis that the two algorithms 

132 are equally efficacious in favor of the alternative that one is preferable). Such studies bundle 

133 together two things: (i) the formulation of the constrained optimization task and (ii) the 

134 performance of the algorithm in performing that task. The first of these requires a translation 

135 of the medical aim into a mathematical model, it is important that this model should be 

136 appropriately chosen. 

137 The superiorization approach is not about choosing this model, it kicks in once the model 

138 is chosen and aims at producing an output that is "good" according to the mathematical 

139 specifications of the constraints and of the optimization criterion. Thus superiorization has 

140 been used to compare the effects on the quality of the output in CT when the optimization 

141 criterion is specified by total variation (TV) versus by entropy or versus by the ^-norm 

142 of the Haar transform 32 . However, the current paper is not about discussing how to trans- 

143 late the underlying medical physics task into a constrained optimization problem. For our 

144 purposes here, we are assuming that the mathematical model has been worked out and 

145 concentrate on the algorithmic approach for solving the resulting constrained optimization 

146 problem. We claim that the evaluation of such algorithms should not be based on the 

147 medical figures of merit mentioned at the beginning of the previous paragraph, but rather 

148 on their performance in solving the mathematical problem. If "good" solutions to the con- 

149 strained optimization problem are not medically efficacious, that indicates that something is 

150 wrong with the mathematical model and not that something is wrong with the algorithmic 

151 approach. For this reason, in this paper we will not carry out a careful investigation of the 

152 medical efficacy of any algorithm in the manner that we have done in 30 (Section 4.3), but will 

153 restrict ourselves to a simple illustration of the performance of the superiorization approach 
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154 as compared to the previously published algorithm of 12 that is aimed at performing exact 

155 minimization. 

156 Examples of such studies already exist. Superiorization was compared in 2 ' with Algorithm 

157 6 of 10 and in with the algorithm of Goldstein and Osher that they refer to as TwIST with 

158 split Bregman as the substep. In both cases the implementation was done by the proposers 

159 of the algorithms. In these reported instances superiorization did well: the constraints- 

160 compatibility and the value of the function to be minimized were very similar for the outputs 

161 produced by the algorithms being compared, but the superiorization algorithm produced its 

162 output four times faster than the alternative. It would be unjustified to draw any general 

163 conclusions on the mathematical performance and speed of superiorization based on just a 

164 few experiments, but the reported results are encouraging. 

165 However, the main reason why we advocate superiorization is different from what is 

166 discussed above. The reason why we claim it to be helpful in medical physics research is 

167 that it has the potential of saving a lot of time and effort for the researcher. Let us consider 

168 a historical example. Likelihood optimization using the iterative process of expectation 

169 maximization (EM) gained immediate and wide acceptance in the emission tomography 

170 community. It was observed that irregular high amplitude patterns occurred in the image 

171 with a large number of iterations, but it was not until five years later that this problem 

172 was corrected 19 by the use of a maximum a posteriority probability (MAP) algorithm with 

173 a multivariate Gaussian prior. Had we had at our disposal the superiorization approach, 

174 then the introduction of an optimization criterion (Gaussian or other) into the iterative 

175 expectation maximization (EM) process would have been a simple matter and we would 

176 have saved the time and effort spent on designing a special purpose algorithm for the MAP 

177 formulation. A 7V-superiorization of the EM algorithm is presented in 50 . 

178 Even though our major claim for superiorization is that it provides a quick route to 

179 algorithms for the practical solution of constrained optimization problems, before leaving 

180 this introduction let us bring up a question that has to do with the performance of the 

181 resulting algorithms: Will superiorization produce superior results to those produced by 

182 contemporary MAP methods or is it faster than the better of such methods? At this stage 

183 we have not yet developed the mathematical notation to discuss this question in a rigorous 

184 manner, we return to it in Subsection II F. 

185 In the next section we present in detail the superiorization methodology. In the subse- 
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186 quent section we provide an illustrative example by reporting on reconstructions produced 

187 by algorithms applied to simulated computerized tomography data of a head cross-section. 

188 In the final section we discuss our results and present our conclusions. 

189 II. THE SUPERIORIZATION METHODOLOGY 

190 A. Problem sets, proximity functions and e-compatibility 

191 Although optimization is often studied in a more general context (such as in Hilbert or 

192 Banach spaces), in medical physics we usually deal with a special case, where optimization 

193 is performed in a Euclidean space M. J (the space of J-dimensional vectors of real numbers, 

194 where J is a positive integer). As often appropriate in practice, we further restrict the 

195 domain of optimization to a nonempty subset Q of IR J (such as the nonnegative orthant M.^_ 

196 that consists of vectors all of whose components are nonnegative). 

197 We now turn to formalizing the notion of being compatible with given constraints, a 

198 notion that we have used informally in the previous section. In any application, there is a 

199 problem set T; each problem T E T is essentially a description of the constraints in that 

200 particular case. For example, for a tomographic scanner, the problem of reconstruction for 

201 a particular patient at a particular time is determined by the measurements taken by the 

202 scanner for that patient at that time. The intuitive notion of constraints-compatibility is 

203 formalized by the use of a proximity function Vr on T such that, for every T G T, Vr^ 

204 maps Q into R+, the set of nonnegative real numbers; i.e., Vrx : —> R+. Intuitively we 

205 think of Vr T (x) as an indicator of how incompatible x is with the constraints of T. For 

206 example, in tomography, Vr T (x) should indicate by how much a proposed reconstruction 

207 that is described by an x in il violates the constraints of the problem T that are provided 

208 by the measurements taken by the scanner. For example, if we use b to denote the vector 

209 of estimated line integrals based on the measurements obtained by the scanner and by A 

210 the system matrix of the scanner, then a possible choice for the proximity function is the 

211 norm-distance \\b — Ax\\, which we will use as an example in the discussions that follow. 

212 An alternative legitimate choice for the proximity function is the Kullback-Leibler distance 

213 KL(b, Ax), which is the negative log-likelihood of a statistical model in tomography. The 

214 special case VrT (x) = is interpreted by saying that x is perfectly compatible with the 
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215 constraints; due to the presence of noise in practical applications, it is quite conceivable 

216 that there is no x that is perfectly compatible with the constraints, and we accept an x 

217 as constraints-compatible as long as the value of Vr? (ce) is considered to be small enough 

218 to justify that decision. Combining these two concepts leads to the notion of a problem 

219 structure, which is a pair (T, Vr), where T is a nonempty problem set and Vr is a proximity 

220 function on T. For a problem structure (T, Vr), a problem T G T, a nonnegative e and an 

221 x G Q, we say that x is e-compatible with T provided that Vr? (x) < e. 

222 As an example (whose applicability to tomographic reconstruction is illustrated in Section 

223 III), consider the problem structure that arises from the desire to find nonnegative solutions 

224 of sequences of blocks of linear equations. Then the appropriate choices are Q, = R{ and 

225 the problem structure is (§, Res), where the problem set § is 

§ = {({(aS&O,...,^, ft*)},..., 

{ (a^+-+^+\ b h+ ... +iw _ 1+1 ) (a^-^,b £l+ ... +ew ) }) | 
W is a positive integer and, (1) 
for 1 < w < W, £ w is a positive integer and, 
for 1 < i < i x + . . . + £ w , a 1 G R J and b { G E} 

226 and the proximity function Res on § is defined, for any problem S = ({(a 1 ,^), 

227 ...,(a e \b tl ) },..., {(a^+"- + ^- + S6 €l+ ... +v _ 1+1 ),...,(a^+-"+^,6, 1+ ... + ^)}) in § and 

228 for any x G f2, by 



Res s (x 



\ 



J2 (k-{a<,x)f. (2) 

1=1 

229 Note that each element of this problem set § specifies an ordered sequence of W blocks 

230 of linear equations of the form {a l ,x) = bi where (*, *) denotes the inner product in IR J 

231 (and thus § is an appropriate representation of the so-called "ordered subsets" approach to 

232 tomographic reconstruction ' 1 , as well as of other earlier-published block-iterative methods 

233 that proposed essentially the same idea 52-54 ). The proximity function Res on S is the residual 

234 that we get when a particular x is substituted into all the equations of a particular problem 

235 S. 
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236 B. Algorithms and outputs 

237 We now define the concept of an algorithm in the general context of problem structures. 

238 For technical reasons that will become clear as we proceed with our development, we intro- 

239 duce an additional set A, such that C A C M J . (Both Q and A are assumed to be known 

240 and fixed for any particular problem structure (T, Vr).) An algorithm P for a problem 

241 structure (T, Vr) assigns to each problem T G T an operator : A — > Q. This definition is 

242 used to define iterative processes that, for any initial point x G Q, produce the (potentially) 

243 infinite sequence I (Pt) x ) (that is, the sequence x, Pt*, Pt (Pt x ) > " " ) °f points in 

V / k=0 

244 Q. We discuss below how such a potentially infinite process is terminated in practice. 

245 Selecting Q = and A = M J for the problem structure (§, Res) of the previous subsec- 

246 tion, an example of an algorithm R is specified by 

R sS = QB % --B Sia;i (3) 

247 where 5* is the problem specified above (2) and, for 1 < w < W, : A — > A is defined by 

04 — (a , x) ■ 

B Sw x = x + r II J ° » W 

248 where ||a|| denotes the norm of the vector a in M J , and Q : A — > Q is defined by 

(Qx) j = max{0, Xj} , for 1 < j < J. (5) 

249 Note that R5 : A — > Q. This specific algorithm R is a typical example of the so-called block- 

250 iterative methods mentioned above. Except for the presence of Q in (3), which enforces 

251 nonnegativity of the components, it is identical to an algorithm used and illustrated in 31 . 

252 With the Q absent from the definition of the algorithm, Q has to be the whole of IR J ; the 

253 practical consequence of the presence versus the absence of Q in the tomographic application 

254 is illustrated in Subsection HID. We note also that special cases of the presented algorithm 

255 include the classical reconstruction methods ART (if £ w = 1, for 1 < w < W) and SIRT (if 

256 W = 1); see, for example, Chapters 11 and 12 of 55 . 

257 For a problem structure (T, Vr), a T G T, an e G IR+ and a sequence 

258 of points in Q, we use O (T, e, R) to denote the x G Q that has the following properties: 

259 Vtt{x) < s and there is a nonnegative integer K such that x K = x and, for all nonnegative 

260 integers k < K, Vr^ (a? fc ) > e. Clearly, if there is such an x, then it is unique. If there is no 
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261 such x, then we say that O (T,e,R) is undefined, otherwise we say that it is defined. The 

262 intuition behind this definition is the following: if we think of R as the (infinite) sequence of 

263 points that is produced by an algorithm (intended for the problem T) without a termination 

264 criterion, then O (T, e, R) is the output produced by that algorithm when we add to it 

265 instructions that make it terminate as soon as it reaches a point that is e-compatible with 

266 T. 

267 C. Bounded perturbation resilience 

268 The notion of a bounded perturbations resilient algorithm P for a problem structure 

269 (T,Vr) has been defined in a mathematically precise manner 29 . However, that definition 

270 is not satisfactory from the point of view of applications in medical physics (or indeed in 

271 any area involving noisy data), because it is useful only for problems T for which there is 

272 a perfectly compatible solution (that is, an x such that Vtt (x) = 0). We therefore extend 

273 here that notion as follows. An algorithm P for a problem structure (T, Vr) is said to be 

274 strongly perturbation resilient if, for all T G T, 

275 (i) there exists an e G K+ such that O (t, e, ^(Pt)^ x J J is defined for every x G f2; 

276 (ii) for all e G R + such that O \ T,e, ^(Pr) fe a;j j is defined for every x G Q, we also 

277 have that O (T, e' , R) is defined for every s' > e and for every sequence R = [x k ^_ Q 

278 of points in Q generated by 

x k+1 = P T (x k + (3 k v k ) , for all k > 0, (6) 

279 where fikV k are bounded perturbations, meaning that the sequence (/3fc)^L of nonnega- 

oo 

280 tive real numbers is summable (that is, < oo), the sequence (u fc )^l of vectors 

fc=0 

281 in R J is bounded and, for all k > 0, x k + (i k v k G A. 

282 In less formal terms, the second of these properties says that for a strongly perturbation 

283 resilient algorithm we have that, for every problem and any nonnegative real number e, if it 

284 is the case that for all initial points from Q the infinite sequence produced by the algorithm 

285 contains an ^-compatible point, then it will also be the case that all perturbed sequences 

286 satisfying (6) contain an e'-compatible point, for any e' > e. 
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287 Having denned the notion of a strongly perturbation resilient algorithm, we next show 

288 that this notion is of relevance to problems in medical physics. We illustrate the use of this 

289 in tomography in the next section. We first need to introduce some mathematical concepts. 

290 Given an algorithm P for a problem structure (T, Vr) and a T G T, we say that 

291 P is convergent for T if, for every x G Q, there exists a unique y (x) G f2 such that, 

292 Hindoo (Pr) k x = y (x), meaning that for every positive real number 5, there exist a non- 



293 negative integer K, such that 



< 5, for all nonnegative integers k > K. 



(P T ) k x - y (x) 

294 If, in addition, there exists a 7 G IR+ such that Vr? (y (x)) < 7, for every x G Q, then we 

295 say that P is boundedly convergent for T. 

296 A function / : $7 — Y R is uniformly continuous if, for every e > there exists a 5 > 0, 

297 such that, for all x, y G £1, \ f{x) — f(y)\ < s provided that — y\\ < 5. An example of a 

298 uniformly continuous function is Ress of (2), for any S G S. This can be proved by observing 

299 that the right-hand side of (2) can be rewritten in vector /matrix form as \\b — Ax\\ and 

300 then selecting, for any given e > 0, 5 to be ej ||A||, where ||A|| denotes the matrix norm of 

301 A. 

302 An operator O : A — > Q, is nonexpansive if ||Oa3 — Oy\\ < \\x — y\\, for all x, y G A. 

303 An example of a nonexpansive operator is the R5 of (3). The proof of this is also simple. 

304 It follows from discussions regarding similar claims in 1 that the Bs w '■ ^ J ^ J °f (4) is a 

305 nonexpansive operator, for 1 < w < W, and that the operator Q of (5) is also nonexpansive. 

306 Obviously, a sequential application of nonexpansive operators results in a nonexpansive 

307 operator and thus R5 is nonexpansive. 

308 Now we state an important new result that gives sufficient conditions for strong perturba- 

309 tion resilience: If P is an algorithm for a problem structure (T, Vr) such that, 

310 for all T G T, P is boundedly convergent for T , Vrx '■ — > K is uniformly 

311 continuous and P^ : A — > Q is nonexpansive, then P is strongly perturbation 

312 resilient . The importance of this result lies in the fact that the rather ordinary condition 

313 of uniform continuity for the proximity function and the reasonable conditions of bounded 

314 convergence and nonexpansiveness of the algorithmic operators guarantee that we end up 

315 with a strongly perturbation resilient algorithm. The proof of this new result involves some 

316 mathematical technicalities and is therefore presented in the Appendix as Theorem 1. 
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317 D. Optimization criterion and nonascending vector 

318 Now suppose, as is indeed the case for the constrained optimization problems discussed 

319 in the previous section, that in addition to a problem structure (T, Vr) we are also provided 

320 with an optimization criterion, which is specified by a function : A — > R, with the 

321 convention that a point in A for which the value of is smaller is considered superior (from 

322 the point of view of our application) to a point in A for which the value of is larger. In the 

323 tomography context, any of the functions of x that are listed as a "secondary optimization 

324 criterion" (an alternative name is a "regularizer") in Section 6.4 of 55 is an acceptable choice 

325 for the optimization criterion 0. These include weighted norms, the negative of Shannon's 

326 entropy and total variation. It is the last of these that we discuss in detail in the illustrative 

327 example below. The essential idea of the superiorization methodology presented in this paper 

328 is to make use of the perturbations of (6) to transform a strongly perturbation resilient 

329 algorithm that seeks a constraints-compatible solution into one whose outputs are equally 

330 good from the point of view of constraints-compatibility, but are superior according to the 

331 optimization criterion. We do this by producing from the algorithm another one, called its 

332 superiorized version, by making sure not only that the 0kV k are bounded perturbations, but 

333 also that (x k + (3 k v k ) < (x k ) , for all k > 0. 

334 In order to ensure this we introduce a new concept (closely related to the concept of a 

335 "descent direction" that is widely used in optimization). Given a function : A — > 1R and a 

336 point x G A, we say that a vector d G R J is nonascending for at x if ||d|| < 1 and 

there is a 5 > such that for all AG [0, S] , 

(7) 

( x + Ad) G A and (x + \d) < (x) . 

337 Note that irrespective of the choices of and x, there is always at least one nonascending 

338 vector d for at x, namely the zero- vector, all of whose components are zero. This is a useful 

339 fact for proving results concerning the guaranteed behavior of our proposed procedures. 

340 However, in order to steer our algorithms toward a point at which the value of is small, 

341 we need to find a d such that (x + Ad) < (x) rather than just (x + Ad) < (x) as in 

342 (7). In some earlier papers on superiorization 2 ' -31 it was assumed that A = IR J and that 

343 is a convex function. This implied that, for any point x G A, had a subgradient g G M J at 

344 the point x. It was suggested that if there is such a g with a positive norm, then d should 

345 be chosen to be —gj otherwise d should be chosen to be the zero vector. However, 
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346 there are approaches (not involving subgradients) to selecting an appropriate d; an example 

347 can be found in in which d is found without using subgradients for the case when <fi is the 

348 fx-norm of the Haar transform. The method we used for selecting a nonascending vector in 

349 the experiments reported in this paper is specified at the end of Subsection III A. 

350 E. Superiorized version of an algorithm 

351 We now make precise the ingredients needed for transforming an algorithm into its su- 

352 periorized version. Let Q and A be the underlying sets for a problem structure (T, Vr) 

353 (Q C A C IR J , as discussed at the beginning of Subsection II B), P be an algorithm for 

354 (T, Vr) and : A — > R. The following description of the Superiorized Version of Algorithm 

355 P produces, for any problem T G T, a sequence Rt = {x k ^_ Q of points in VL for which, for 

356 all k > 0, (6) is satisfied. We show this to be true, for any algorithm P, after the description 

357 of the Superiorized Version of Algorithm P. Furthermore, since the sequence Rt is steered 

358 by Superiorized Version of Algorithm P toward a reduced value of <f), there is an intuitive 

359 expectation that the output of the superiorized version is likely to be superior (from the 

360 point of view of the optimization criterion <fi) to the output of the original unperturbed 

361 algorithm. This last statement is not precise and so it cannot be proved in a mathematical 

362 sense for an arbitrary algorithm P; however, that should not stop us from applying the 

363 easy procedure given below for automatically producing the Superiorized Version of P and 

364 experimentally checking whether it indeed provides us with outputs superior to those of the 

365 original algorithm. The well-demonstrated nature of heuristic optimization approaches is 

366 that they often work in practice even when their performance cannot be guaranteed to be 

367 optimal 33-35 . 

368 Nevertheless, we can push our theory further than the hope expressed in the last para- 

369 graph, by considering superiorized versions of algorithms that satisfy some condition. In 

370 this paper, the condition that we discuss is strong perturbation resilience. We show below 

371 that if P is strongly perturbation resilient, then, for any problem T G T, a sequence Rt 

372 produced by its superiorized version has the following desirable property: For all e G 1R + , if 

373 O (t, e, ^(Pr) fc x j j is defined for every x G Q, then O (T, e 1 , Rt) is also defined for every 

374 e' > e; in other words, the Superiorized Version of Algorithm P provides an e'-compatible 

375 output. As stated above, the advantage of the superiorized version is that its output is 
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376 likely to be superior to the output of the original unperturbed algorithm. We point out that 

377 strong perturbation resilience is a sufficient, but not necessary, condition for guaranteeing 

378 such desirable behavior of the superiorized version, finding additional sufficient conditions 

379 and proving that algorithms that we wish to superiorize satisfy such conditions is part of 

380 our ongoing research. 

381 The superiorized version assumes that we have available a summable sequence (7^)^ of 

382 positive real numbers (for example, 7^ = a £ , where < a < 1) and it generates, simultane- 

383 ously with the sequence (x k ^ =Q , sequences (v k ) c ^ =Q and (/3fc)fclo- The latter is generated as 

384 a subsequence of (7^)^ , resulting in a summable sequence (/3fc)fcLo- ^he algorithm further 

385 depends on a specified initial point x G Q and on a positive integer N. It makes use of a 

386 logical variable called loop. 

387 Superiorized Version of Algorithm P 



388 


(i) set k = 





389 


(ii) set 


x k - 


= X 


390 


(iii) set £ = 


-1 


391 


(iv) repeat 




392 


(v) 


set n = 


393 


(vi) 


set x k ' n = x k 


394 


(vii) 


while n < N 


395 


(viii) 




set v k ' n to be a nonascending vector for 


396 


(ix) 




set loop=true 


397 


(x) 




while loop 


398 


(xi) 




set £ = £ + 1 


399 


(xii) 




set p ktn = 7^ 


400 


(xiii) 




set z = x k > n + (3 k:n v k ' n 


401 


(xiv) 




if z e A and <f) (z) < (x k ) then 
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402 (xv 

403 (xvi 

404 (xvii 

405 (xviii 

406 (xix 



set n = n + 1 



set x k ' n 



set loop = false 
set x k+1 = P T x h > N 
set k = k + 1 



407 Next we analyze the behavior of the Superiorized Version of Algorithm P. 

408 The iteration number k is set to in (i) and x k = x° is set to its initial value x in (ii). The 

409 integer index I for picking the next element from the sequence {^t)'^L is initialized to —1 by 

410 line (iii), it is repeatedly increased by line (xi). The lines (v) - (xix) that follow the repeat 

411 in (iv) perform a complete iterative step from x k to x k+1 , infinite repetitions of such steps 

412 provide the sequence Rt = {x k ) k _ Q . During one iterative step, there is one application of 

413 the operator Pt, in line (xviii), but there are N steering steps aimed at reducing the value of 

414 <f); the latter are done by lines (v) - (xvii). These lines produce a sequence of points 

415 x k > n , where < n < N with x k >° = x k , x k > n G A and (x k ' n ) < <p (x k ) . 

416 We prove the truth of the last sentence by induction on the nonnegative integers. For 

417 n = 0, we have by lines (v) and (vi) that x k, ° = x k . But x k G Q , since it is either x that is 

418 assumed to be in Q due to lines (i) and (ii) or it is in the range Q of Pt due to lines (xviii) 

419 and (xix). Now we assume, for any < n < N, that x k,n G A and (p (x k > n ^j < (f> (x k ^ and 

420 show that lines (viii) - (xvii) perform a computation that leads from x k,n to an x k ' n+1 G A 

421 that satisfies <p (x k ' n+1 ) < <p(x k ). To see this, observe that line (viii) sets v k,n to be a 

422 nonascending vector for at x k,n , which implies that (7) is satisfied with x = x k ' n and 

423 d = v k,n . Line (ix) sets loop to true, and it remains true while searching for the desired 

424 x k,n+l , by repeatedly executing the loop sequence that follows line (x). In this sequence, 

425 line (xi) increases I by 1 and line (xii) sets 0k, n to 7^. Thus for the vector z defined by line 

426 (xiii), z G A and (p (z) < (f> provided that (3k, n is not greater than the 5 in (7). Since 

427 (7^)^ is a summable sequence of positive real numbers, there must be a positive integer L 

428 such that 7^ < 5, for all I > L. This implies that if we applied lines (xi) - (xiii) often enough, 

429 we would reach a vector z that satisfies z G A and (j) (z) < <fr (a3 fc ' n ) . If the condition in line 

430 (xiv) is not satisfied when the process gets to it, then lines (xi) - (xiii) are again executed 

431 and eventually we get a vector z for which the condition in line (xiv) is satisfied due to the 
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432 induction hypothesis that <p (x k ' n ) < <p (aj fc ) . By lines (xv) and (xvi) we see that at that 

433 time x k ' n+1 is set to z and so we obtain that x k,n+1 G A and <p (cc fc ' n+1 ) < <p (aj fc ) , as desired. 

434 Line (xvii) sets loop to false and so control is returned to line (vii). When this happens for 

435 the iVth time, it will be the case that n = N and therefore line (xviii) is used to produce 

436 x k+l G fl and the increasing of k by line (xix) allows us then to move on to the next iterative 

437 step. Infinite repetition of such steps produces the sequence Rt = {x k ) k _ Q of points in f2. 

438 We now show that if O (t,s, ^(P-r) fe x^j J is defined for every x G Q, then, for any 

439 e' > e, the Superiorized Version of Algorithm P produces an e'-compatible output. Since P 

440 is assumed to be strongly perturbation resilient, this desired result follows if we can show 

441 that there exists a summable sequence (/3fc)^L of nonnegative real numbers and a bounded 

442 sequence (u fc )^ of vectors in IR J such that (6) is satisfied for all k > 0. In view of line 

443 (xviii), this is achieved if we can define the f3 k and the v k so that x k ' N = x k + (3 k v k . This 

444 is done by setting 

(5 k = max{/\„|0 < n < N} , (8) 

N-l 
n=0 

445 That these assignments result in x k,N = x k + P k v h follows from lines (v) - (xvii). From line 

446 (xii) follows that (/3fc)jtlo ^ s a subsequence of (7^)^L and, hence, it is a summable sequence 

447 of nonnegative real numbers. Since each ||t> fc ' n || < 1 by the definition of a nonascending 

448 vector, it follows from (8) and (9) that ||f fc || < N and so (v k ^_ Q is bounded. Part of the 

449 condition expressed in (6) is that, for all k > 0, x k + f3 k v k G A. This follows from the fact 

450 that x k ' N = x k + /3 k v k is assigned its value by line (xvi), but only if the condition expressed 

451 in line (xiv) is satisfied. 

452 In conclusion, we have shown that the superiorized version of a strongly perturbation 

453 resilient algorithm produces outputs that are essentially as constraints-compatible as those 

454 produced by the original version of the algorithm. However, due to the repeated steering of 

455 the process by lines (vii) - (xvii) toward reducing the value of the optimization criterion <f), 

456 we can expect that the output of the superiorized version will be superior (from the point 

457 of view of <p) to the output of the original algorithm. 
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458 



F. Information on performance comparison with MAP methods 



459 



Using our notation, the constrained minimization formulation that we are considering is: 



460 Given an e e R. 



minimize <fi(x), subject to Vtt (x) < e. 



(10) 



461 The aim of superiorization is not identical with the aim of constrained minimization in (10). 

462 One difference is that e is not "given" in the superiorization context. The superiorization 

463 of an algorithm produces a sequence and, for any e, the associated output of the algorithm 

464 is considered to be the first x in the sequence for which Vtt (x) < e. The other difference 

465 is that we do not claim that this output is a minimizer of among all points that satisfy 

466 the constraint, but hope only that it is usually an x for which <f)(x) is at the small end 

467 of its range of values over the set of constraint-satisfying points. This latter difference is 

468 generally shared by comparisons of a heuristic approach with an exact approach to solving 

469 a constrained minimization problem. 

470 The MAP (or regularized) formulation of a physical problem that leads to the constrained 

471 minimization problem (10) is the unconstrained minimization problem of the form: Given 

472 a f3 G K+, 



473 Formulations of both kinds (i.e, the ones of (10) and of (11)) are widely used for solving 

474 medical physics problems and the question "Which of these two formulations leads to faster or 

475 better solutions of the underlying physical problem?" is open. Examples of both formulations 

476 with various choices for Vr? and <fi are listed in the beginning parts of the paper of Goldstein 

477 and Osher 11 . 

478 We now return to the question raised near the end of Section I: Will superiorization pro- 

479 duce superior results to those produced by contemporary MAP methods or is it faster than 

480 the better of such methods? As yet, there is very little information available regarding this 

481 general question; in fact, we are aware of only one published study 1 '. That study compared 

482 a superiorization algorithm with the algorithm of Goldstein and Osher that they refer to 

483 as TwIST with split Bregman as the substep, which is indeed a contemporary method 

484 that uses the MAP formulation. (For example, see the discussion of the split Bregman 

485 method in 56 .) The problem S to which the two algorithms were applied was one from the 

486 tomographic problem set § defined in (1). Ress as defined in (2) was used as the proximity 



minimize [<p(x) + (3Vtt (x)] . 



(11) 
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487 function and total variation, TV as defined below in (12), was the choice for 0. It is reported 

488 in 45 that for the outputs of the two algorithms that were being compared, the values of Ress 

489 and TV were very similar, but the superiorization algorithm produced its output four times 

490 faster than the MAP method. 

491 III. AN ILLUSTRATIVE EXAMPLE 

492 A. Application to tomography 

493 We use tomography to refer to the process of reconstructing a function over a Euclidean 

494 space from estimated values of its integrals along lines (that are usually, but not necessarily, 

495 straight). The particular reconstruction processes to which our discussion applies are the 

496 series expansion methods, see Section 6.3 of 55 , in which it is assumed that the function to 

497 be reconstructed can be approximated by a linear combination of a finite number (say J) 

498 of basis functions and the reconstruction task becomes one of estimating the coefficients of 

499 the basis functions in the expansion. Sometimes, prior knowledge about the nature of the 

500 function to be reconstructed allows us to confine the sought-after vector x of coefficients to 

501 a subset Q of IR J (such as the nonnegative orthant We use i to index the lines along 

502 which we integrate, a 1 G 1R" 7 to denote the vector whose jth component is the integral of the 

503 jth basis function along the ith line, and b{ to denote the measured integral of the function 

504 to be reconstructed along the zth line. Under these circumstances the constraints come from 

505 the desire that, for each of the lines, {a % , x) should be close (in some sense) to 6j. 

506 To make this concrete, consider (1). Such a description of the constraints arises in 

507 tomography by grouping the lines of integration into W blocks, with t w lines in the u>th block. 

508 Such groupings often (but not always) are done according to some geometrical condition on 

509 the lines (for example, in case of straight lines, we may decide that all the lines that are 

510 parallel to each other form one block). In this framework the proximity function Res defined 

511 by (2) provides a reasonable measure of the incompatibility of a vector x with the constraints. 

512 The algorithm R described by (3) - (5) is applicable to this concrete formulation. 

513 There are many optimization criteria that have been used in tomography, see Section 

514 6.4 of 55 , here we discuss the one called total variation (TV), whose use has been popular 

515 in medical physics recently, see as examples 20 ' 22 ' 23 ' 41-44 . The definition of TV that we use 



19 



516 here requires a certain way of selecting the basis functions. It is assumed that the function 

517 to be reconstructed is defined in the plane R 2 and is zero-valued outside a square-shaped 

518 region in the plane. This region is subdivided into J smaller equal-sized squares {pixels) 

519 and the J basis functions are defined by having value one in exactly one pixel and value 

520 zero everywhere else. We index the pixels by j and we let C denote the set of all indices of 

521 pixels that are not in the rightmost column or the bottom row of the pixel array. For any 

522 pixel with index j in C, let r(j) and b(j) be the index of the pixel to its right and below it, 

523 respectively. We define TV : 1R J ->■ R by 

TV(x) = V ( x i -Mi)) 2 + i x j- x bU)) 2 - ( 12 ) 

524 The method we adopted to generate a nonascending vector for the TV function at an 

525 x G R J is based on Theorem 2 of the Appendix. It is applicable since TV : R J — > R is a 

526 convex function; see, for example, the end of the Proof of Proposition 1 of . Now consider 

527 an integer j' such that 1 < f < J. Looking at the sum in (12), we see that xy appears in 

528 at most three terms, in which j' must be either j, or r(j), or b(j) for some j G C. By taking 

529 the formal partial derivatives of these three terms, we see that §^-(«c) is well-defined if the 

530 denominator in the formal derivative of any of the three terms is not zero for x. In view of 

531 this, we define the g in Theorem 2 as follows. If the denominator in any of the three formal 

532 partial derivatives with respect to xy has an absolute value less than a very small positive 

533 number (we used 10~ 20 ), then we set gj> to zero, otherwise we set it to §pf(a;). Clearly 

534 the resulting g G R J satisfies the condition in Theorem 2 and hence provides a d that is a 

535 nonascending vector for TV at x. 

536 Previously reported reconstructions using 7V-superiorization selected the d using sub- 

537 gradients as discussed in the paragraph following (7); such a d is not guaranteed to be a 

538 nonascending vector for the TV function. What we are proposing here is not only mathe- 

539 matically rigorous (in the sense that it is guaranteed to produce a nonascending vector for 

540 the TV function), but it can also lead to a better reconstructions, as illustrated in Subsection 

541 HID. 
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542 B. The data generation for the experiments 

543 The data sets used in the experiments reported in this paper were generated in such a 

544 way that they share the noise-characteristics of CT scanners when used for scanning the 

545 human head and brain; as discussed, for example, in Chapter 5 of 55 . They were generated 

546 using the software SNARK09 57 . 

547 The head phantom that was used for data generation is based on an actual cross-section 

548 of the human head. It is described as a collection of geometrical objects (such as ellipses, 

549 triangles and segments of circles) whose combination accurately resembles the anatomical 

550 features of the actual head cross-section. In addition, the basic phantom contains a large 

551 tumor. The actual phantom used was obtained by a random variation of the basic phantom, 

552 by incorporating into it local inhomogeneities and small low-contrast tumors at random 

553 locations. This phantom is represented by the image in figure 1. That image comprises 

554 485 x 485 pixels each of size 0.376 mm by 0.376 mm. The values assigned to the pixels are 

555 obtained by an 11 x 11 sub-sampling of the pixels and averaging the values assigned to the 

556 sub-samples by the geometrical objects that are used to describe the anatomical features 

557 and the tumors. Those values are approximate linear attenuation coefficients per cm at 60 

558 keV (0.416 for bone, 0.210 for brain, 0.207 for cerebrospinal fluid). The contrast of the small 

559 tumors with their background is 0.003 cm" 1 . In order to clearly see the low-contrast details 

560 in the interior of the skull, we use zero (black) to represent the value 0.204 (or anything less) 

561 and 255 (white) to represent 0.21675 or anything more). 

562 For the selected head phantom we generated parallel projection data, in which one view 

563 comprises estimates of integrals through the phantom for a set of 693 equally-spaced parallel 

564 lines with a spacing of 0.0376 cm between them. (We chose to simulate parallel rather 

565 than divergent projection data, since the reconstruction by the method of with which 

566 we wish to compare the superiorization approach were performed for us by the authors 

567 of on parallel data. Even though contemporary CT scanners use divergent projection 

568 data, results obtained by the use of parallel projection data are relevant to them, since it 

569 is known that the quality of reconstructions from these two modes of data collection are 

570 very similar as long as the data generations use similar frequencies of sampling of lines and 

571 similar noise characteristics in the estimated integrals for those lines; see, for example, the 

572 reconstructions from divergent and parallel projection data in figure 5.15 of".) In calculating 
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(a) (b) 

Figure 1: (a) A head phantom, (b) Reconstruction of the head phantom from realistically 
simulated projection data for 360 views using ART with blob basis functions. 



573 these estimates we take into consideration the effects of photon statistics, detector width 

574 and scatter. Details of how we do this exactly can be found in Sections 5.5 and 5.9 of 55 . 

575 Briefly, quantum noise is calculated based on the assumption that approximately 2,000,000 

576 photons enter the head along each ray, detector width is simulated by using 11 sub-rays 

577 along each of which the attenuation is calculated independently and then combined at the 

578 detector, and 5% of the photons get counted not by the detector for the ray in question but 

579 detectors for the neighboring rays. For the experiments in this paper, we did not simulate 

580 the poly-energetic nature of the x-ray source. To indicate what can be achieved in clinical 

581 CT, we show in figure 1(b) a reconstruction that was made from data comprising of 360 

582 such views with the reconstruction algorithm known as ART with blob basis functions; see 55 

583 (Chapter 11). 

584 C. Superiorization reconstruction from a few views 

585 The main reason in the literature for advocating the use of TV as the optimization 

586 criterion is that by doing so one can achieve efficacious reconstructions even from sparsely 
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587 sampled data. In our own work 31 with realistically simulated CT data we found that this is 

588 not always the case and this will be demonstrated again by the experiments reported in the 

589 current paper. 

590 There have appeared in the literature some approaches to TV minimization that seem 

591 to indicate a more efficacious performance for CT than the one reported in 31 . One of these 

592 is the Adaptive Steepest Descent Projections Onto Convex Sets (ASD-POCS) algorithm, 

593 which is described in detail in the much-cited paper of Sidky and Pan 12 and whose use has 

594 been since reported in a number of subsequent publications, for example, in 23 ' . We note 

595 that ASD-POCS was designed with the aim of producing an exact minimization algorithm, 

596 in contrast to our heuristic superiorization approach. Translating equations (6)-(8) of 

597 into our terminology, the aim of ASD-POCS is the following: Given an e G R+, find an 

598 ^-compatible x G Vt = for which TV(x) is minimal. (Note that this aim is a special 

599 case of the constrained optimization formulation presented in (10).) In order to test ASD- 

600 POCS, we generated realistic projection data as described in the previous subsection but 

601 for only 60 views at 3 degree increments with the spacing between the lines for which 

602 integrals are estimated set at 0.752 mm. Thus the number of rays (and hence the number 

603 photons put into the head) in this data set is a twelfth of what it is in the data set used to 

604 produce the reconstruction in figure 1(b). A reconstruction from these data was produced 

605 for us using ASD-POCS by the authors of (this ensured that it does not suffer due to our 

606 misinterpretation of the algorithm or from our inappropriate choices of the free parameters), 

607 it is shown in figure 2(a). 

608 Since the image quality of figure 2(a) is not anywhere near to that of figure 1(b), we present 

609 here a brief discussion as to why we are showing such images. Many publications in the recent 

610 medical imaging literature have claimed that medically-efficacious reconstructions can be 

611 obtained by the use of T^-minimization from data as sparse as what was used to produce 

612 figure 2(a). (In fact, ASD-POCS was motivated and used with such an aim in mind 23,42 ' 43 .) 

613 Such publications usually show reconstructions from sparse data as evidence for the validity 

614 of their claims. They can do this because in their presented illustrations the features that 

615 are observable in the reconstructions are usually much larger and/or of much higher contrast 

616 against their backgrounds than the small "tumors" in figure 1(a), which are perfectly visible 

617 in the reconstruction in figure 1(b), but are not detectable in the reconstruction from sparse 

618 data in figure 2(a). The reason why that reconstruction appears to be unacceptably bad is 
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(a) (b) 

Figure 2: Reconstructions using TV as the optimization criterion from realistically 
simulated projection data for 60 views using (a) ASD-POCS and (b) superiorization. As 
compared to figure 1(b), these reconstructions fail in two ways: they do not show some of 
the fine details in the phantom and they present some artifactual variations. The former of 
these is a consequence of reconstructing from a much smaller data set than used for figure 
1(b). The latter is due to using a very narrow window (13.5 HU) in these displays. Were 
we to use a wider display window (e.g., from -429 HU to 429 HU) for the reconstructions 
in this figure and in figure 1(b), the visual appearance of the resulting images would be 

nearly indistinguishable. 

619 that the display window (from 0.204 cm" 1 linear attenuation coefficient to 0.21675 cm" 1 linear 

620 attenuation coefficient) is very narrow; it was selected to enhance the visibility of the small 

621 low-contrast tumors. The width of this window corresponds to about 13.5 Hounsfield Units 

622 (HU). As compared to this, in their evaluation of sparse- view reconstruction from flat-panel- 

623 detector cone-beam CT, Bian et al. 1 ! use what they call a "soft-tissue grayscale window" 

624 (also a "narrow window") from -429 HU to 429 HU to display head phantom reconstructions. 

625 Using such a window for our reconstructions shown figures 2(a) and 1(b) would result in 

626 images that are nearly indistinguishable from each other. Thus reporting the images using 

627 such a display window is consistent with the claim that a TV-minimizing reconstruction 
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628 from a few views is similar in quality to a more traditional reconstruction from many views. 

629 However, our much narrower display window reveals that this is not really so. We therefore 

630 continue using our much narrower window in what follows, since it clearly reveals the nature 

631 of the reconstructions being compared, warts and all. 

632 While this ASD-POCS reconstruction is not as good as it should be for diagnostic CT of 

633 the brain (due to the sparsity of the data), it is visually better than the reconstruction using 

634 superiorization from similar data as reported in 31 . We discuss the reasons for this in the 

635 next subsection. Here we concentrate on examining whether one can achieve a reconstruction 

636 using superiorization that is as good as that produced by ASD-POCS from the same data. 

637 For this we first need to examine the numerical properties of the ASD-POCS reconstruc- 

638 tion. This reconstruction uses 485 x 485 pixels each of size 0.376 mm by 0.376 mm. This 

639 implies that J = 235,225 and it also determines the components of the vectors a 1 e R J in 

640 the precise specification of the problem S. The Ress, as defined by (2), of the ASD-POCS 

641 reconstruction is 0.33 and the TV, as defined by (12), is 835. 

642 We applied to the same problem S a superiorized version of the algorithm R defined 

643 by (3). To complete the specification of R, we point out that for the ordering of views we 

644 chose the "efficient" one that was introduced in 58 and is also discussed on page 209 of 55 . 

645 The choices we made for the superiorization are the following: 7^ = 0.99995^, x is the zero 

646 vector and iV = 20. The nonascending vector was computed by the method described in the 

647 paragraph below (12). Denoting by R$ the infinite sequence of points in Q that is produced 

648 by the superiorized version of the algorithm R when applied to the problem 5*, we chose as 

649 our reconstruction x* = O (S, 0.33, Rs)- For such a reconstruction we have, by the definition 

650 of O, that Res,s {x*) < 0.33; in other words, the output of the superiorization algorithm is 

651 at least as constraints-compatible with 5* as the output of ASD-POCS. From the point of 

652 view of Ty-minimization, our x* is slightly better: TV (x*) =826. 

653 The superiorization reconstruction is displayed in figure 2(b). Visually it is similar to the 

654 reconstruction produced by ASD-POCS. From the optimization point of view it achieves the 

655 desired aim better than ASD-POCS does, since it results in smaller values for both Ress 

656 and for TV, even though only slightly. 

657 That the two reconstructions in figure 2 are very similar is not surprising because a 

658 comparison of the pseudo-codes reveals that the ASD-POCS algorithm in 42 is essentially a 

659 special case of the Superiorized Version of Algorithm P, even though it has been derived 
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660 from rather different principles. To obtain the ASD-POCS algorithm from our methodology 

661 described here, we would have to choose an Algebraic Reconstruction Technique (ART; 

662 see Chapter 11 of") as the algorithm that we are superiorizing. Such a superiorization of 

663 ART was reported in the earliest paper on superiorization J ' . For the illustration in our 

664 current paper we decided to superiorize the block-iterative algorithm R defined by (3). 

665 This illustrates the generality of the superiorization approach: it is applicable not only to 

666 a large class of constrained optimization problems, but also enables the use of any of a 

667 large class of iterative algorithms designed to produce a constraints-compatible solutions. 

668 A recent publication aimed at producing an exact TV-minimizing algorithm based on the 

669 block-iterative approach is . 

670 D. Effects of variations in the reconstruction approach 

671 The reconstruction in figure 2(a) produced by ASD-POCS definitely "looks better" than 

672 a reconstruction in 31 , which was obtained using superiorization from similar data. Since, as 

673 discussed in the last paragraph of the previous subsection, the ASD-POCS algorithm in 

674 can be obtained as a special case of superiorization, it must be that some of the choices made 

675 in the details of the implementations are responsible for the visual differences. An analysis 

676 of the implementational details adopted by the two approaches revealed several differences. 

677 After removing these differences, the superiorization approach produced the image in figure 

678 2(b), which is very similar to the reconstruction produced by ASD-POCS. We now list the 

679 implementational choices that were made for superiorization to make its performance match 

680 that of the reported implementation of ASD-POCS. 

681 One implementational difference is in the stopping-rule of the iterative algorithm; that 

682 is, the choice of e in determining the output O (S,e, Rs)- Since the data are noisy, the 

683 phantom itself does not match the data exactly. In previously reported implementations of 

684 superiorization it was assumed that the iterative process should terminate when an image 

685 is obtained that is approximately as constraints-compatible as the phantom; in the case of 

686 the phantom and the projections data on which we report here the value of Res$ for the 

687 phantom is approximately 0.91, which is larger than its value (0.33) for the reconstruction 

688 produced by ASD-POCS. The output O (S, 0.91, Rs) is shown in figure 3(a). This is a 

689 wonderfully smooth reconstruction, its TV value is only 771. However this smoothness 
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690 comes at a price: we loose not only the ability to detect the large tumor, but we cannot 

691 even see anatomic features (such as the ventricular cavities) inside the brain. So it appears 

692 that, in order to see medically-relevant features in the brain, over-fitting (in the sense of 

693 producing a reconstruction from noisy data that is more constraints-compatible than the 

694 phantom) is desirable. 

695 In the implementations that produced previously reported reconstructions by superior- 

696 ization, the number N in the Superiorized Version of Algorithm P was always chosen to 

697 be 1. It is possible that this is the wrong choice, making only this change to what lead to 

698 the reconstruction in figure 2(b) results in the reconstruction shown in figure 3(b). That 

699 image appears similar to the image in figure 2(b), but it has a higher TV value, namely 832, 

700 which is still very slightly lower than that of the ASD-POCS reconstruction. The choice 

701 N = 20 was based on the desire to maintain consistency with what has been practiced using 

702 ASD-POCS, see page 4790 of 12 . It appears that in the context of our paper the additional 

703 computing cost due to choosing N to be 20 rather than 1 is not really justified. (We note 

704 that if d is selected using subgradients as discussed in the paragraph following (7) and thus 

705 d is not guaranteed to be a nonascending vector for the TV function, then the choice of 

706 20 rather than 1 for N results in a considerable improvement. However, an even greater 

707 improvement is achieved even with N = 1 by selecting d as recommended in this paper.) 

708 Another important difference between the ASD-POCS implementation and the previous 

709 implementations of the superiorization approach is the size of the pixels in the reconstruc- 

710 tions. For the ASD-POCS reconstruction this was selected to be 0.376 mm by 0.376 mm. 

711 In previously reported reconstructions by superiorization it was assumed that the edge of 

712 a pixel should be the same as the distance between the parallel lines along which the data 

713 are collected; that is, 0.752 mm for our problem S. This assumption proved to be false. 

714 T\^-minimization takes care of undesirable artifacts that may otherwise arise due to the 

715 smaller pixels and this leads to a visual improvement. A superiorizing reconstruction with 

716 the larger pixels, using e = 0.33 and N = 20, is shown in figure 3(c). (We note that the use 

717 of smaller pixels during iterative x-ray CT reconstructions was also suggested in 59 . How- 

718 ever, that approach is quite different from what is presented here: its final result uses larger 

719 pixels whose values are obtained by averaging assemblies of values provided by the iterative 

720 process to the smaller pixels. There is no such downsampling in our approach, our final 

721 result is presented using the smaller pixels. Its smoothness is due to reduction of TV by the 
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(c) 



Figure 3: Reconstructions produced by varying some of the parameters in the algorithm 
that produced figure 2(b). (a) Changing the termination criterion form e = 0.33 to 
e = 0.91. (b) Changing the value of N from 20 to 1. (c) Reconstructing with pixel size 
0.752 mm by 0.752 mm instead of 0.376 mm by 0.376 mm. (d) Reconstructing with all the 

three changes of (a)-(c). 
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722 superiorization approach rather than to averaging pixel values in a denser digitization.) 

723 Combining the use of the larger pixels with e = 0.91 and N — 1 results in the reconstruc- 

724 tion shown in figure 3(d). This reconstruction, for which the superiorization options were 

725 selected according to what was done in , is visually inferior to those shown in our figure 

726 2. The reconstructions displayed in figure 3 also illustrate another important point, namely 

727 that even though the mathematical results discussed in this paper are valid for a large range 

728 of choices of the parameters in the superiorization algorithms, for medical efficacy of the 

729 reconstructions attention has to be paid to these choices since they can have a drastic effect 

730 on the quality of the reconstruction. 

731 It has been mentioned in Subsection II B that except for the presence of Q in (3), which 

732 enforces nonnegativity of the components, R is identical to the algorithm used and illustrated 

733 in . It is known that CT reconstruction of the brain from many views does not suffer 

734 from ignoring the fact that the components of the x, which represent linear attenuation 

735 coefficients, should be nonnegative; as is illustrated in figure 1(b). This remains so when 

736 reconstructing from a few views using the method and data that we have been discussing: 

737 if we do everything in exactly the same way as was done to obtain the reconstruction with 

738 TV value 826 that is shown in our figure 2(b) but remove Q from (3), then we obtain a 

739 reconstruction in figure 4(a) whose TV value is 829. 

740 Another variation that deserves discussion, because it has been suggested in the 

741 literature 22 , is one that does not come about by making choices for the general approach of 

742 the Superiorized Version of Algorithm P but rather by changing the nature of the approach. 

743 The variation in question is not applicable in general, but can be applied to the special 

744 case when the algorithm to be superiorized is the R defined by (3). It was suggested as 

745 an improvement to the approach presented above with the choice N — 1. The idea was 

746 based on recognizing the block-iterative nature of the algorithmic operator R5 in (3) and 

747 intermingling the perturbation steps of lines (vii)-(xvii) of the Superiorized Version of Al- 

748 gorithm R with the projection steps B^, . . . , Bs w of (3). It was reported in 22 that doing 

749 this is advantageous to using the Superiorized Version of Algorithm R. However, when we 

750 applied the variation of the Superiorized Version of Algorithm R that is proposed in 22 to 

751 the problem S that we have been using in this section, we ended up with the reconstruction 

752 in figure 4(b) whose TV value is 920. This is not as good as what was obtained using the 

753 version of the algorithm that produced the reconstruction in figure 2(b). We conclude that 
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(a) (b) 



Figure 4: Reconstructions by variations that do not fit into the framework within which 
the previously shown reconstructions were produced, (a) Not using nonnegativity in the 
algorithm, (b) Interleaving perturbations with blocks. 

754 the variation suggested by 22 , which does not fit into the theory of our paper, does not have 

755 an advantage over what we are proposing here, at least for the problem S that we have 

756 been discussing in this section. We conjecture that the improvement reported in 22 is due to 

757 selecting d using subgradients as discussed in the paragraph following (7) and, as discussed 

758 earlier, such an improvement is not obtained if d is selected by the more appropriate method 

759 recommended in this paper. 

760 IV. DISCUSSION AND CONCLUSIONS 

761 Constrained optimization is an often-used tool in medical physics. The methodology of 

762 superiorization is a heuristic (as opposed to exact) approach to constrained optimization. 

763 Although the idea of superiorization was introduced in 2007 and its practical use has been 

764 demonstrated in several publications since, this paper is the first to provide a solid math- 

765 ematical foundation to superiorization as applied to the noisy problems of the real world. 

766 These foundations include a precise definition of constraints-compatibility, the concept of a 
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767 strongly perturbation resilient algorithm, simple conditions that ensure that an algorithm 

768 is strongly perturbation resilient, the superiorized version of an algorithm and the showing 

769 that the superiorized version of a strongly perturbation resilient algorithm produces outputs 

770 that are essentially as constraints-compatible as those produced by the original version but 

771 are likely to have a smaller value of the chosen optimization criterion. 

772 The approach is very general. For any iterative algorithm P and for any optimization 

773 criterion (f) for which we know how to produce nonascending vectors, the pseudocode given 

774 in Subsection II E automatically provides the version of P that is superiorized for (f). 

775 We demonstrated superiorization for tomography when total variation is used as the 

776 optimization criterion. In particular, we illustrated on a particular tomography problem 

777 that, in spite of its generality, superiorization produced a reconstruction that is as good 

778 as (from the points of view of constraints-compatibility and T\^-minimization) what was 

779 obtained by the ASD-POCS algorithm that was specially designed for Ty-minimization in 

780 tomography. 
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789 Appendix 

790 Conditions for strong perturbation resilience 

791 Theorem 1. Let P be an algorithm for a problem structure (T, Vr) such that, for all 

792 T G T, P is boundedly convergent for T, Vtt '■ — > K is uniformly continuous and 

793 Pt '■ A — > Q is nonexpansive. Then P is strongly perturbation resilient. 
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794 



Proof. We first show that there exists an e G IR+ such that O \ T,e 



;pr)V 



k=0/ 



795 is defined for every x G Vt. Under the assumptions of the theorem, let 7 G K+ be such 

796 that Vr-r (y (x)) < 7, for every x G Q. We prove that O ^T, 27, ^(Pr) fc :rj j is defined 

797 for every x G f2 as follows. Select a particular x G fi. By uniform continuity of Vtt, 

798 there exists a 5 > 0, such that \Ptt (z) — VrT{y(x))\ < 7, for any z G Q for which 

799 || z — y {x)\\ < 5. Since P is convergent for T, there exists a nonnegative integer K, such 



800 that 



P T ) K x-y(x) 



< 5. It follows that 



< 



Pr T ((P r ) X - Vr T (y (x)) | + |Pr r (y (aj)) 



;p T )^ 



fc=0 



(13) 

is defined for every 



Pr T ((P T )V 

< 27. 

801 Now let T G T and e G R+ be such that O (V, e, 

802 a; G Vt. To prove the theorem, we need to show that O (T, e', R) is defined for every e' > e 

803 and for every sequence R = (x k ^_ Q of points in f2 for which, for all k > 0, (6) is satisfied for 

804 bounded perturbations Pk vk ■ Let e' and R satisfy the conditions of the previous sentence. 

For k > 0, we have, due to the nonexpansiveness of Pt, that 

1 1 x k+1 - P T x k || = ||P T (x k + p k v k ) - P T x k \\<\\f3 k v k \\. (14) 

805 Denote ||/3fci' fc || by r k . Clearly, r k G 1R + and it follows from the definition of bounded 

00 

806 perturbations that ^V/c < 00. 

807 We next prove by induction that, for every pair of nonnegative integers k and i, 

k+i-l 



x k+l -(P T Yx 



< 



E 

j=k 



(15) 



808 Let k be an arbitrary nonnegative integer. If i — 0, then the value is zero on both sides of 

809 the inequality and hence (15) holds. Now assume that (15) holds for an integer i > 0. Then, 

810 by (14) and the nonexpansiveness of P^, 



x 



k+i+l 



(Pt) 



1+1 x k 



< ||a.*+<+i _ p T x k+i \\ 

P T x k+i - {P T ) t+l x k 
r k+l + \\x k+i -{P T ) l x k 



< 



\\x 

< r k+i + ^ 

j=k 

k+i 

= J2 r i> 

j=k 



(16) 
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811 which completes our inductive proof. A consequence of (15) is that, for every pair of non- 
812 negative integers k and i, 



x 



k+i 



-(p r )Vj|<x> 



(17) 



j=k 



813 Due to the summability of the nonnegative sequence (rfc)?l , the right-hand side (and hence 

814 the left-hand side) of this inequality gets arbitrarily close to zero as k increases. 

815 Since Vr? is uniformly continuous, there exists a 5 such that, for all x,y £ Q, 

816 \Vtt{x) — VrT{y)\ < e' — e provided that \\x — y\\ < 5. Select a A; so that YlT=k r i — 



817 By the assumption that O [T,e 

818 nonnegative integer i for which Vr 



[P T ) k x 



k=0 



V T Yx k 



is defined for every x £ Q, there exists a 
< e. From (17) we have, for this k and i, 



819 that 



x k+i _ (p T y x 



1 n^k 



< 5 and, hence, 



|7Vr(£c 



k+i\ 



< 



Vr T (x k+i ) -Vr T ( (P T yx k 



+ 



Vr T (CP T Yx k 



(18) 



< (e'-e)+e = e' ) 
820 proving that O (T, e', R) is defined. □ 
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Nonascending vectors for convex functions 



822 



824 



826 



828 



Theorem 2. Let 



be a convex function and let x £ M J . Let gr £ IR J satisfy the 



823 property: For 1< j < J, if the j'th component ^ of g is not zero, then the partial derivative 
(x) of 4> at a? exists and its value is ^. Define d to be the zero vector if ||gr|| = and to 



825 be —gj ||gr|| otherwise. Then d is a nonascending vector for (f) at a;. 



Proof. The theorem is trivially true if ||gr|| = 0, so we assume that this is not the case. 



827 We denote by / the nonempty set of those indices j for which gj ^ 0. 



For 1 < j < J, let Sj be Sj/\ gj \ for j £ I and be otherwise, and let e? £ M J be the vector 



829 all of whose components are zero except for the jth, which is one. Then, for I < j < J, 



830 there exists a 5j > such that, for < Xj < 5j, 



(x — XjSje j ) < <p (x) 



(19) 



831 This is obvious if s. 



0. Otherwise, ^fr(#) exists and indicates <p increases at x if Sj 



832 or that 6 decreases at x if s. 



T. The existence of the desired 5j can be derived from the 



833 standard definition of the partial derivative as a limit. 
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834 We define 5 > by 

a U*ll • J s i \ 
o = mm < > . 

835 Then we have that, for < A < 5, 




J 3=1 

= <P(x). 

836 The first inequality above follows from the convexity of and the second one follows from 

837 (19), with Xj defined to be AjjgJ, combined with (20). Thus d is a nonascending vector for 

838 at x. □ 
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